ggplot2 is one of the most widely used packages for data
visualization in the R community. It implements the “Grammar of
Graphics” approach, and plenty of useful free resources are
available online.
Let’s import the OpenPowerlifting dataset taken from Kaggle:
library(tidyverse) # attaches ggplot2, dplyr and tidyr, which are used throughout
powerlift_data <- read.csv("./data/openpowerlifting2.csv", sep = ";")
str(powerlift_data)
## 'data.frame': 117463 obs. of 11 variables:
## $ Name : chr "Adrian Zwaan" "Aiden Westrip" "Andrew Fella" "Andrew Yuile" ...
## $ Sex : chr "M" "M" "M" "M" ...
## $ Event : chr "SBD" "SBD" "SBD" "SBD" ...
## $ Age : num 80 28 27 36 34 26 27 32 31 28 ...
## $ Bodyweight: num 82.1 82 89.2 79.5 114.7 ...
## $ Squat : num 100 228 260 125 270 ...
## $ Bench : num 72.5 135 140 77.5 180 ...
## $ Deadlift : num 145 250 250 142 270 ...
## $ Total : num 318 612 650 345 720 ...
## $ Federation: chr "GPC-AUS" "GPC-AUS" "GPC-AUS" "GPC-AUS" ...
## $ Date : chr "27/10/2018" "27/10/2018" "27/10/2018" "27/10/2018" ...
Our data set contains results and descriptive data from pretty strong people.
head(powerlift_data)
## Name Sex Event Age Bodyweight Squat Bench Deadlift Total
## 1 Adrian Zwaan M SBD 80 82.1 100.0 72.5 145.0 317.5
## 2 Aiden Westrip M SBD 28 82.0 227.5 135.0 250.0 612.5
## 3 Andrew Fella M SBD 27 89.2 260.0 140.0 250.0 650.0
## 4 Andrew Yuile M SBD 36 79.5 125.0 77.5 142.5 345.0
## 5 Anthony Provenza M SBD 34 114.7 270.0 180.0 270.0 720.0
## 6 Arian Behbehani M SBD 26 97.6 300.0 167.5 282.5 750.0
## Federation Date
## 1 GPC-AUS 27/10/2018
## 2 GPC-AUS 27/10/2018
## 3 GPC-AUS 27/10/2018
## 4 GPC-AUS 27/10/2018
## 5 GPC-AUS 27/10/2018
## 6 GPC-AUS 27/10/2018
Let’s explore the main components of a ggplot object (data, aesthetic mapping, geometries) by looking at the relationship between body weight and total score.
powerlift_data %>%
ggplot(aes(x = Bodyweight, y = Total)) +
geom_point() +
labs(x = "Bodyweight (Kg)", y = "Total Score (Kg)")
## Warning: Removed 11133 rows containing missing values (geom_point).
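To make the three components concrete, here is a minimal sketch on a toy data frame. The values are made up for illustration and do not come from the OpenPowerlifting file. The plot is built in two steps, and `na.rm = TRUE` (a standard geom argument) drops incomplete rows silently instead of emitting a warning like the one above.

```r
library(ggplot2)

# Toy data: made-up values, with one missing body weight on purpose
toy <- data.frame(
  Bodyweight = c(60, 75, 90, 105, NA),
  Total      = c(300, 450, 560, 640, 700)
)

# Step 1: data + aesthetic mapping (no geometry yet, so no points are drawn)
p_base <- ggplot(toy, aes(x = Bodyweight, y = Total))

# Step 2: add a geometry layer; na.rm = TRUE removes the NA row without a warning
p <- p_base + geom_point(na.rm = TRUE)

length(p_base$layers)  # layers before adding the geom
length(p$layers)       # layers after adding the geom
```

Printing `p` draws the plot. Whether silencing the warning is a good idea depends on the analysis; seeing how many rows were dropped is often useful information.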
Let’s add some transparency to reveal where the points overlap:
powerlift_data %>%
ggplot(aes(x = Bodyweight, y = Total)) +
geom_point(alpha=0.05) +
labs(x = "Bodyweight (Kg)", y = "Total Score (Kg)")
## Warning: Removed 11133 rows containing missing values (geom_point).
Modifying aesthetics and adding layers is a piece of cake! Let’s map the point color to the federation:
powerlift_data %>%
ggplot(aes(x = Bodyweight, y = Total)) +
geom_point(aes(color = Federation)) +
labs(x = "Bodyweight (Kg)", y = "Total Score (Kg)")
## Warning: Removed 11133 rows containing missing values (geom_point).
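The difference between mapping an aesthetic inside `aes()` and setting it to a fixed value outside `aes()` is easy to miss. The sketch below contrasts the two on a toy data frame. The values and the federation names `FedA` and `FedB` are made up for illustration.

```r
library(ggplot2)

# Toy data: made-up values and hypothetical federation names
toy <- data.frame(
  Bodyweight = c(60, 75, 90, 105),
  Total      = c(300, 450, 560, 640),
  Federation = c("FedA", "FedA", "FedB", "FedB")
)

# Mapped: color depends on a column, so ggplot2 picks one color per federation
# and adds a legend
p_mapped <- ggplot(toy, aes(x = Bodyweight, y = Total)) +
  geom_point(aes(color = Federation))

# Set: every point gets the same fixed color and transparency, no legend
p_fixed <- ggplot(toy, aes(x = Bodyweight, y = Total)) +
  geom_point(color = "steelblue", alpha = 0.5)

# layer_data() returns the data actually used to draw a layer
unique(layer_data(p_mapped)$colour)  # one color per federation
unique(layer_data(p_fixed)$colour)   # the single fixed color
```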
A bar chart, however, is a better way to count the number of powerlifters in each federation.
ggplot(powerlift_data, aes(Federation)) +
geom_bar() +
labs(x = "Federation", y = "Number of People")
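It helps to know what `geom_bar()` actually counts: with only an `x` aesthetic it tallies the rows in each category, so a lifter who appears in several rows is counted several times. The sketch below checks this on a toy data frame (made-up rows and hypothetical federation names) by comparing the computed bar heights with `table()`.

```r
library(ggplot2)

# Toy data: six made-up entries across three hypothetical federations
toy <- data.frame(
  Federation = c("FedA", "FedA", "FedB", "FedA", "FedC", "FedB")
)

p <- ggplot(toy, aes(Federation)) +
  geom_bar()

# Bar heights computed by the default stat (stat_count)
bar_counts <- layer_data(p)$count

# The same tally computed directly from the data
row_counts <- as.vector(table(toy$Federation))

bar_counts
row_counts
```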
Shall we arrange the bars by count? This time each lifter name is counted only once, using distinct():
federation_counts <- powerlift_data %>%
distinct(Name, .keep_all = TRUE) %>%
group_by(Federation) %>%
summarise(count = n()) %>%
arrange(desc(count))
federations_plot <- federation_counts |>
# filter(count > 500) |>
ggplot(aes(reorder(Federation, -count), count)) +
geom_bar(stat = "identity") +
labs(
x = "Federation",
y = "Number of People"
)
federations_plot
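The ordering comes from `reorder()`, which turns its first argument into a factor whose levels are sorted by the second one. Negating the count gives a descending order. A small sketch with made-up counts and hypothetical federation names follows. It also uses `geom_col()`, which is shorthand for `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Toy summary table: made-up counts for hypothetical federations
counts <- data.frame(
  Federation = c("FedA", "FedB", "FedC"),
  count      = c(2, 5, 3)
)

# Levels are sorted by increasing value of the second argument,
# so -count puts the largest federation first
ordered_fed <- reorder(counts$Federation, -counts$count)
levels(ordered_fed)

# Same idea inside a plot, with geom_col() for pre-computed heights
p <- ggplot(counts, aes(reorder(Federation, -count), count)) +
  geom_col() +
  labs(x = "Federation", y = "Number of People")
```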
Let’s improve axis text visibility…
federations_plot +
theme(axis.text.x = element_text(angle = 90))
We can go further and keep only the federations with more than 500 lifters:
fedplot3 <- federation_counts |>
filter(count > 500) |>
ggplot(aes(reorder(Federation, -count), count)) +
geom_bar(stat = "identity") +
labs(
x = "Federation",
y = "Number of People"
) +
theme(axis.text.x = element_text(angle = 90))
fedplot3
ggplot2 offers a wide range of customization options: axes, labels, titles, and legends can all be modified easily. Themes help keep visualizations consistent and visually appealing:
fedplot3 + theme_minimal()
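One detail worth keeping in mind: a complete theme such as `theme_minimal()` replaces theme settings added before it, so the rotated axis labels are reset. A minimal sketch of the ordering issue, on a toy data frame with made-up values and hypothetical names, could look like this.

```r
library(ggplot2)

# Toy data: made-up counts for hypothetical federations
toy <- data.frame(
  Federation = c("FedA", "FedB", "FedC"),
  count      = c(2, 5, 3)
)

p <- ggplot(toy, aes(Federation, count)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90))

# The complete theme comes last, so it overrides the rotation set earlier
p_reset <- p + theme_minimal()

# Re-applying the tweak after the complete theme keeps the rotation
p_rotated <- p +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90))
```

Putting the complete theme first and the individual `theme()` tweaks after it is a simple habit that avoids the problem.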
Faceting, which splits a plot into one panel per level of a variable, is also very easy in ggplot2:
powerlift_data %>%
pivot_longer(cols = c("Squat", "Bench", "Deadlift"), names_to = "exercise") %>%
ggplot(aes(x=Bodyweight, y=value))+
geom_point()+
facet_grid(.~exercise)
## Warning: Removed 28422 rows containing missing values (geom_point).
powerlift_data %>%
pivot_longer(cols = c("Squat", "Bench", "Deadlift"), names_to = "exercise") %>%
filter(value > 0) %>%
ggplot(aes(x=Bodyweight, y=value))+
geom_point(alpha = 0.05)+
facet_grid(.~exercise) +
labs(y="Score (Kg)")
## Warning: Removed 8202 rows containing missing values (geom_point).
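The faceting above relies on `pivot_longer()` to stack the three lift columns into a single `value` column, with the lift name stored in `exercise`. The sketch below shows the reshaping on a toy data frame. The lifters and numbers are made up for illustration.

```r
library(tidyr)

# Toy data in wide format: one row per (hypothetical) lifter
toy <- data.frame(
  Name       = c("Lifter 1", "Lifter 2"),
  Bodyweight = c(80, 95),
  Squat      = c(200, 240),
  Bench      = c(120, 150),
  Deadlift   = c(230, 270)
)

# Long format: one row per lifter and exercise;
# the values land in a column called "value" by default
long <- pivot_longer(
  toy,
  cols = c("Squat", "Bench", "Deadlift"),
  names_to = "exercise"
)

names(long)
nrow(long)
```

With the data in this shape, `facet_grid(. ~ exercise)` only needs the single `exercise` column to split the panels.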
What if we need some stats layers?
powerlift_data %>%
pivot_longer(cols = c("Squat", "Bench", "Deadlift"), names_to = "exercise") %>%
filter(value > 0) %>%
ggplot(aes(x=Bodyweight, y=value))+
geom_point(alpha = 0.05)+
geom_smooth()+
facet_grid(.~exercise) +
labs(y="Score (Kg)")
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Removed 8202 rows containing non-finite values (stat_smooth).
## Warning: Removed 8202 rows containing missing values (geom_point).
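The message above shows that `geom_smooth()` chose a GAM on its own, which is its default for larger datasets. The smoother can also be set explicitly. As a sketch of a different choice (not the one used above), the toy example below fits a straight line with `method = "lm"`. The data are simulated and have nothing to do with the powerlifting file.

```r
library(ggplot2)

# Simulated toy data: a linear trend plus noise
set.seed(1)
toy <- data.frame(x = 1:50)
toy$y <- 2 * toy$x + rnorm(50, sd = 5)

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(alpha = 0.5) +
  # Explicit method and formula: no informational message is printed;
  # se = FALSE hides the confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# The fitted line is stored as the data of the second layer
smooth_data <- layer_data(p, 2)
head(smooth_data[, c("x", "y")])
```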
Another example of aesthetic manipulation: squat vs. bench press performance for a given federation, with point size mapped to age.
powerlift_data %>%
filter(Federation =="NASA" & Squat > 0 & Bench > 0) %>%
ggplot(aes(x=Bench, y=Squat))+
geom_point(aes(size=Age))
## Warning: Removed 2657 rows containing missing values (geom_point).
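ggplot2 is primarily designed for static visualizations, and
incorporating interactive elements using ggplot2 alone is tricky.
The ggplotly function from the plotly package deserves attention, though: it converts a ggplot into an interactive plot.
plotly::ggplotly() # with no argument, converts the last plot that was displayed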
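Passing the plot explicitly is usually clearer than relying on the last plot displayed. A minimal sketch, assuming the plotly package is installed and using a toy data frame with made-up values:

```r
library(ggplot2)

# Toy data: made-up bench and squat results
toy <- data.frame(
  Bench = c(100, 120, 140, 160),
  Squat = c(150, 185, 210, 250)
)

p <- ggplot(toy, aes(x = Bench, y = Squat)) +
  geom_point()

# Convert only if plotly is available (assumption: it may not be installed)
if (requireNamespace("plotly", quietly = TRUE)) {
  interactive_p <- plotly::ggplotly(p)
}
```

In an interactive session or a rendered HTML document, printing `interactive_p` shows the plot with hover tooltips and zooming.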
When many comparisons and iterations are needed, ggplot2 can be used in combination with Shiny to make the exploration interactive.
powerlift_data %>%
filter(Squat > 0 & Bench > 0) %>%
ggplot(aes(x=Bench, y=Squat))+
geom_point()+
facet_wrap("Federation")